{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Welcome to DeepTCR! We created DeepTCR to help you analyzie T-Cell Receptor Sequencing (TCRSeq) data in a structurally informed way. We created a variety of these tutorials to guide you in using DeepTCR."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first thing we will go through is how to get your TCRSeq data into DeepTCR for both unsupervised and supervised learning analysis. Generally, you will have data that is organized by TCR sequences (either alpha or beta or paired alpha/beta) with v/d/j gene usage and counts/frequencies for those sequences. We will go over two methods by which one can load data into DeepTCR."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading Data From Files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first way is to load data that is stored in TCRSeq delimited filed (like those produced by Adaptive Biotechnologies). First, it's important to put your data into a main directory with sub-directories. The main directory reflects the project of data while the subdirectories represent possible groupings/class labels of your data. An example of this formatting can be seen in the Data directory provided. Several datasets exist and can be looked for their formatting. If there is no labels for the data, just create one subdirectory with all the files in the folder. The program can either take csv or tsv files (comma,tab delimited files).\n",
    "Once the data has been prepared in the directories as directed, one can call the following command to load the data.The important inputs into this function include the directory where the data is stored, whether we want to aggregate to only unique aa (aggregatate_by_aa=True), and then the columns where our data is stored. For example, the beta chain data is located in column 0. DeepTCR can take any type of TCRSeq data as long as at least one type of data is provided.Of note, if this command has been executed previously, one can choose to se the Load_Prev_Data parameter to True to directly load the saved data quickly from an automatically stored pickle file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data Loaded\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "sys.path.append('../')\n",
    "from DeepTCR.DeepTCR import DeepTCR_U\n",
    "\n",
    "# Instantiate training object\n",
    "DTCRU = DeepTCR_U('Tutorial')\n",
    "\n",
    "#Load Data from directories\n",
    "DTCRU.Get_Data(directory='../Data/Murine_Antigens',Load_Prev_Data=False,aggregate_by_aa=True,\n",
    "               aa_column_beta=0,count_column=1,v_beta_column=2,j_beta_column=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading Data Programtically"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second way one can load data into DeepTCRis through directly inputting the data programtically. This is useful if one has sequence data from other analyses and wants to utilzie some of the DeepTCR's featurization capabilties for other analyses in part of a larger pipeline.To do this, one can load data directly into DeepTCRw ith the follwing command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data Loaded\n"
     ]
    }
   ],
   "source": [
    "beta = DTCRU.beta_sequences\n",
    "freq = DTCRU.freq\n",
    "counts = DTCRU.counts\n",
    "v_beta = DTCRU.v_beta\n",
    "d_beta = DTCRU.d_beta\n",
    "j_beta = DTCRU.j_beta\n",
    "class_labels = DTCRU.class_id\n",
    "sample_labels = DTCRU.sample_id\n",
    "DTCRU.Load_Data(beta_sequences=beta,v_beta=v_beta,d_beta=d_beta,\n",
    "                j_beta=j_beta,class_labels=class_labels,sample_labels=sample_labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of note, after data has been loaded from files, one can pull this data for other analyses by referencing the object variables as demontrated above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
